When Science Counts as Much as Reading and Mathematics: A Study of Differentiated State
Accountability Policies

Abstract: Although only results from mathematics and reading assessments are required to
calculate schools' Adequate Yearly Progress (AYP), some states have chosen to include science
results, either in their AYP calculations or as part of a dual accountability system. This study
examined 2009 results from the National Assessment of Educational Progress (NAEP) according
to whether or not states used science results in their accountability programs. It considered the
idea that efforts in science could diminish efforts, and therefore results, in mathematics and reading.
Results at both fourth and eighth grade indicate that states that chose to use science in their
assessments did not lose ground in the other subjects. In addition, for fourth grade the data indicate
that states that used science data in their accountability programs had significantly higher science
achievement than other states.
Keywords: accountability; assessment; education policy; science education; large-scale
assessment

When Science Counts as Much as Reading and Mathematics: A Study of Differentiated Public
Accountability Policies

Abstract: Although only reading and mathematics results are required to calculate schools'
Adequate Yearly Progress (AYP), some states have chosen to include science results, either in their
AYP calculations or as part of a dual accountability system. This study analyzed the 2009 results of
the National Assessment of Educational Progress (NAEP) according to whether or not states used
science results in their assessment programs. It considered that efforts in science could weaken
efforts, and therefore results, in mathematics and reading. Results at both fourth and eighth grade
indicate that states that decided to use science in their assessments did not lose ground in other
subjects. Furthermore, for fourth grade, the evidence suggests that states that used science data in
their assessment programs had significantly higher science results than other states.
Keywords: accountability; assessment; educational policy; science education; large-scale
assessment

Introduction 

In the decade that has followed the passing of the No Child Left Behind (NCLB) Act, 
educators, policymakers, and researchers have often referred to a resulting narrowed curriculum. 
Typically, a narrowed curriculum refers to an overconcentration on the content areas of 
mathematics and reading and a diminished focus on other subjects, such as science and social 
studies. This is commonly believed to occur because achievement results from mathematics and 
reading contribute directly to the calculation of Adequate Yearly Progress (AYP) for schools and for 
local educational agencies (LEAs). When schools or LEAs miss AYP in successive years in the same
area, such as mathematics achievement, the institution is subject to intervention from the state
department of education, and if AYP is missed in the same area for six consecutive years, invasive
actions such as removal of a school board or replacement of staff may occur (Northwest Regional
Educational Laboratory, 2004). Other measures, such as the percentage of students tested, high
school graduation rates, and attendance rates, also contribute to AYP calculations, but it is student
achievement results from mathematics and reading that are the primary reasons schools and LEAs 
miss AYP (Hoff, 2009). 

Criteria for the use of mathematics and reading achievement results were laid out in the 
NCLB Act. A chief objective for the designers of NCLB was that 100 percent of students would 
meet or exceed grade level expectations in mathematics and reading by the 2013-14 academic year 
(No Child Left Behind Act, 2002; Spellings, 2007). Regarding science, the NCLB Act specified that 
science standards were to be in place by the 2005-06 school year and, by the 2007-08 school year,
states were to have in place science assessments to be administered at least once during grades 3-5,
grades 6-9, and grades 10-12 (Gross et al., 2005). Thirty-one states administer science assessments in
only the three grades minimally required. Mathematics and reading assessments, however, are
administered in at least seven grades in all fifty states. Significant to this study, although states are
required to have science standards in place and to assess science, there is no
stipulation within the NCLB Act requiring states to use the results from their science assessments as
part of accountability calculations. Despite there being no federal requirement to include science
achievement in accountability calculations, eleven states have chosen to do just that.

To take the figure of speech of the narrowing curriculum further, the author suggests the 
image of an accountability filtration system (Figure 1). On the left side of the accountability 
membrane it is envisioned there is an extensive curriculum scope where all content areas are 
represented equitably. However, the accountability membrane allows primarily just the mathematics 
and reading content to easily pass. The fraction of a content area present to the left or to the right of 
the membrane would perhaps be proportionate to the amount of effort dedicated to teaching and 
learning that particular subject. In this simplistic model it is predicted that student learning is 
balanced on the left side of the membrane where effort is distributed among the subjects and 
equivalent achievement is occurring. To the right of the membrane, it may be believed that greater
concentration on the two subjects of mathematics and reading leads to focused effort by teachers
and students and, subsequently, to greater achievement in these two most-tested subjects. In this sense,
the curriculum has not been narrowed so much as it has been restricted.




Figure 1. The Accountability Filter

However, this filtration system operates differently among the states. In some states science
is passing through the accountability membrane in addition to mathematics and reading. That is, a 
few states include science achievement directly in their accountability calculations alongside 
mathematics achievement and reading achievement. If a narrowed, or filtered, curriculum does lead 
to improved results in the subjects that pass through the accountability boundary, then there might 
be concern that achievement becomes more diffused in those few states where more than just two 
subjects are allowed to pass through. This might be a particular concern in elementary schools where 
greater latitude is often allowed regarding how much time and effort is spent on each subject. That is 
to say, if elementary teachers must be held accountable for three, instead of two subjects, the 
concern can immediately be that in order to attend to science, there will be a loss in mathematics or 
reading, or both. 

This concern for how science is attended to in classrooms and how that attention may be 
affected by state accountability policies does not simply result from a sentiment for fairness among 
the subject areas, but stems from broader national interest. Various national-level policy documents 
have promoted the importance of supporting effective science education programs as a means to 
strengthen national pride and invigorate the United States’ global competitiveness (e.g., National
Governors Association, 2007; National Research Council, 2007). While these documents, and 
policies such as the America COMPETES Act, endorse an agenda of enriched science curricula and 
improved science teacher training, they have not impacted the practices of state departments of 
education calculating AYP. 

The intent of this study was to investigate effects of allowing more than the two subject 
areas of mathematics and reading to be part of accountability calculations. In comparing the few 
states that integrate science into their accountability calculations with all other states, two research 
questions were pursued: 

1. Does science achievement of students differ between states that use science as part of their 
accountability calculations and states that do not use science in their accountability 
calculations? 

2. Do mathematics achievement and reading achievement of students differ between states
that use science as part of their accountability calculations and states that do not use science 
in their accountability calculations? 

Background 

It can be generally agreed that a primary intention of the NCLB legislation was for all 
children to achieve specific proficiency standards in mathematics and reading. This goal was 
supported through several key provisions of the NCLB Act such as requiring schools that have not 
made AYP for two consecutive years to provide opportunities for their students to receive extra 
academic assistance and promoting teacher quality by ensuring teachers meet state certification 
requirements (Shaul & Ganson, 2005). Although it is not difficult to find commentaries and studies 
that contest the value of NCLB (e.g., Amrein-Beardsley, 2009), there does exist lukewarm support 
for the efficacy of NCLB in the research literature. Using estimates of accountability pressure among
the states, Nichols, Glass, and Berliner (2006) found no relationship between accountability pressure
and later cohort mathematics achievement on National Assessment of Educational Progress
(NAEP) results at the fourth- and eighth-grade levels; however, the researchers did find a causal
relationship between high-stakes testing pressure and subsequent non-cohort
fourth-grade mathematics achievement (i.e., comparing achievement to a subsequent year’s
achievement from the same grade level). Nichols, Glass, and Berliner considered this effect likely
due to increased time spent on mathematics instruction. Also, using a rating scale to approximate
states’ strength of accountability, Carnoy and Loeb (2002) found substantial gains in eighth-grade 
mathematics scores when states raised external pressure on schools. Finally, in a comprehensive 
study of the effects of external accountability on student achievement, Lee (2008) examined the 76 
effect size estimates from 14 large-scale studies. Results of Lee’s meta-analysis showed modest 
positive policy effects on average, but the study did not address possible shifting of resources or 
attention from one subject area to another when accountability pressures mount. 

Although a scholarly debate may persist for years to come regarding the value of policies 
that provide sanctions and rewards based largely on results from once-a-year multiple-choice tests, 
teachers and school administrators do not have the luxury of dismissing the NCLB related 
requirements that have been translated and set down by their state departments of education. While 
it is known that achievement of both high and low performing students can be affected by multiple 
variables, such as teacher quality (Darling-Hammond, 2000) and parental influence (Davis-Kean, 
2005), there remains the question of what effect the distal variable of an accountability program has 
on different levels of student performance. Reback (2008) found that when school personnel believe 
the goal of attaining proficiency is achievable they are quick to respond to looming interventions and 
students at the lowest levels of performance make greater than expected gains in mathematics, yet 
high achieving students do not make similar gains in mathematics. However, this type of 
differentiated improvement may be due to school personnel attending to achievement in an 
educational triage manner - with students who are nearly passing or failing high stakes exams 
receiving greatest attention. Reback concluded that accountability incentives could influence 
achievement across subjects and across grades: 

If a school has a relatively strong incentive to improve students' math performance 
in a particular grade, then the lowest achieving students in that grade outperform 
similar schoolmates. The other students in that grade, however, perform worse than 
similar schoolmates in the other grades, (unless their own performance is relatively 
important for the school's rating). If a school has a relatively strong incentive to 
improve some students' reading performance in a particular grade, then other 
students in this grade perform much worse than similar schoolmates. The findings 
are again consistent with schools sacrificing general performance in a classroom to 
focus on the performance of particular students. (p. 1411)

In support of this idea that accountability policies can lead to a shifting of resources, in a case study 
of one elementary school, Booher-Jennings (2005) also concluded that teachers’ response to 
potential accountability consequences led to a focus on students close to performance thresholds 
and to diminished attention for other students. Similarly, Diamond and Spillane (2004) analyzed data 
from observations and interviews and inferred that low performing schools had a more limited 
focus on improving achievement among a narrow band of students who were at or near 
performance levels. This narrowed focus was also found within the Chicago Public Schools where 
Neal and Schanzenbach (2010) used data from standardized tests to learn that students in the third 
and fourth deciles of prior achievement made greater than expected gains in both mathematics and 
reading, while students in the first and second deciles remained stagnant. 

However, other researchers have found that although failing schools may target students 
who are on the boundary of meeting performance expectations, this does not occur at the expense 
of the higher performing students enrolled at the same schools (Springer, 2007). Adding to the 
mixed results of the NCLB effect, Ballou and Springer (2008) found that improvements experienced 
at schools do not necessarily come at the cost of affecting high-performing students. In fact, they 
found pressured schools tend to increase achievement in most grades and not just in the grades 
where low achievement had led to the schools missing performance targets. Furthermore, the
findings of these researchers cast doubt on the notion that NCLB-related gains are largest among
schools most likely to face sanctions; in fact, Ballou and Springer found the opposite: that response 
to NCLB has been greatest among schools that are least threatened by interventions due to low test 
scores. 

While the aforementioned research literature addresses how resources and attention can be 
transacted between the subjects of reading and mathematics, across grades, and among student 
groups, less has been studied regarding the effects on other subjects, namely science. There was 
optimism earlier in the history of NCLB that the requirement of states to assess science in at least 
three grades would actually lead to an increased focus on the subject (Cavanagh, 2007). However, a 
review of further reports leads to the conclusion that any early optimism turned to concern about 
science being relegated to a subject less important than reading or mathematics, especially in the 
elementary grades. Although elementary teachers have been reported to describe their beliefs
about teaching science as unchanged by NCLB and to hold a generally positive attitude
(Milner, Sondergeld, Demir, Johnson, & Czerniak, 2011), reductions in the amount of time
schools spend on science have been reported repeatedly (Kingsbury,
2007; Linn, 2008; McMurrer, 2007; McMurrer, 2008). As might be expected, school and district
personnel have reported cuts in time spent on science, as well as arts, social studies, physical
education, and even lunch and recess. These cuts have been attributed to a shared perception of the
need to spend more time on mathematics and reading. In fact, analysis of data from elementary
schools found that, in light of NCLB mandates, science was cut by at least 75 minutes per week in at
least half of the reporting districts (McMurrer, 2008).

Yet, as discussed, accountability programs are not identical among the states and a few states 
do require science achievement to be part of calculations when determining if schools are meeting 
or not meeting targets necessary to avoid intervention from their state departments of education. In 
a comparison of fourth-grade and eighth-grade 2005 NAEP science achievement results, states were 
grouped based on whether they did or did not include science in their accountability calculations 
(Judson, 2010). Results of this study revealed that there was no appreciable difference between the 
groups of states when comparing eighth-grade NAEP science results. However, analysis of the 
fourth-grade NAEP science results revealed there were significant differences in favor of states that 
used science in their accountability programs. The medium effect size of the difference in
fourth-grade results between the groups of states can, on the one hand, be taken as support for including
science in accountability formulas. On the other hand, missing from that study was an examination
of what was simultaneously happening to achievement in mathematics and reading. If the states that
use science in their accountability programs are shown to have significantly higher science
achievement than other states on a common assessment but are also found to have inferior
achievement in mathematics or reading, then the argument can be made that allowing
a third subject to pass through the accountability membrane leads to diffused results across the
other high-stakes content areas. The intent of the present study was then to pick up where this earlier study
left off. Using the more recent 2009 NAEP science achievement data, comparisons were to be made 
between states that choose to use and not use science in their accountability programs. Additionally, 
the analysis here would go further and examine if differences between these groups of states could 
be detected in the mathematics and reading achievement results of their fourth- and eighth-grade 
students. 

Methods


Categorizing the States 

The states were grouped into three categories based on how they use or do not use 
science achievement in their accountability program calculations. Although all of the states 
assess science achievement in at least three grades, only a few have mandated that the results 
from their state science assessments contribute to the determinations of whether schools and 
LEAs are meeting accountability benchmarks. Choosing to use science as part of accountability 
programs may be done in one of two ways. One way is for a state to directly integrate science 
achievement results into the federally required AYP calculations. Although the states are 
required to use high school graduation rate as a variable when determining if secondary schools 
are meeting expectations, the states are allowed to decide what variable they will use as an 
additional indicator when calculating AYP for the elementary grades. The large majority of states 
have selected attendance as their additional indicator for the elementary grades. However, a few 
states, such as New York, use science achievement as their additional indicator when calculating 
AYP. 

The second way that science achievement can contribute to accountability is when it is 
part of a second accountability system that is parallel to the NCLB required AYP-based 
accountability system. Because most states did have some form of accountability and reporting 
in place prior to the commencement of NCLB legislation, several of those states have chosen to 
continue with, and adapt, their previous accountability systems. At present there are fifteen 
states that have dual accountability programs (i.e., the AYP accountability program plus a state 
accountability program) in place (Blank & Hovanetz, 2009). Among these fifteen states, a few 
require that, as part of their state accountability program, science achievement be included in the 
calculations. For example, within Utah’s U-PASS accountability system, science achievement 
contributes 20 percent of a school’s proficiency rating. For this study, for a state to be labeled as one
that “uses” science in its dual accountability program, the parallel
accountability system needed to carry potential penalties when targets are not met, just as the
AYP-based accountability program does. In other words, if the dual accountability program
simply assessed and reported schools’ status, but did not carry the weight of possible
intervention when schools failed, then it was viewed as equivalent to the predominant
AYP-based system in which states report science results but those results do not contribute to
determining whether or not a school or LEA is subject to sanctions.

The states were categorized into three groups. Group 1 consisted of the thirty-nine
states that do not use science achievement in their accountability calculations. The definitions 
for the second and third groups of states were based on (a) the degree to which science is 
required as an accountability variable, and (b) the grade levels assessed by the states. Each of 
these criteria is clarified here. Among the eleven states that integrate science achievement results 
into accountability calculations, some of those states allow schools to select science from a 
menu of choices. This is the case in Georgia where science may be chosen as the additional 
indicator for AYP by the elementary schools, but most often the schools choose to use
attendance rate. If a state allows science to be used in accountability but does not require it,
that state was placed in Group 2. 

Because the NAEP science assessment achievement results were used in this study as the 
dependent variable when comparing states, the definitions for the second and third groups were 
also influenced by the grades tested by NAEP. That is, while some states do require that science 
achievement be included in accountability, the grades from which they require those results do 
not always match the grades assessed by NAEP. NAEP tests science only in the fourth and eighth
grades. If a state required science to be included in accountability but did not require results from
fourth or eighth grade, the state was placed in Group 2. There were cases such as North
Carolina, which requires fifth- and eighth-grade science results to be used in its accountability
program. In this instance, North Carolina matches only one NAEP grade. Therefore, North
Carolina was placed in Group 2 for fourth-grade categorization and in Group 3 for eighth-grade
categorization. Group 2 comprises the states that either have partially committed to using science in
accountability by making it a choice or require the use of science achievement in
accountability but do not require those results from a NAEP-tested grade (i.e., fourth or
eighth grade). The rationale for creating the category of Group 2, as opposed to aggregating
these states with the states that required the use of science achievement results and did match
the NAEP grades, was to maintain a clear focus on possible effects of a stricter definition of
using science achievement in accountability. The Group 2 states were not combined with the
Group 1 states because there was interest in seeing whether a spillover effect was in play in states
that had made some movement in the direction of using science achievement in accountability.
Group 3 comprises those states that require science to be used in accountability calculations and
require achievement results from a grade matching a NAEP-tested grade. The groupings of the
states are provided in Table 1. 
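
The grouping rules described above can be expressed as a short decision function. The following Python sketch is illustrative only: the `SciencePolicy` record, its field names, and `assign_group` are hypothetical names introduced here, not part of the study's materials, and the North Carolina and Georgia inputs encode only what the text states about those two states.

```python
from dataclasses import dataclass
from typing import Tuple

NAEP_SCIENCE_GRADES = (4, 8)  # NAEP tests science in fourth and eighth grade


@dataclass(frozen=True)
class SciencePolicy:
    """Hypothetical summary of one state's policy (field names are illustrative)."""
    uses_science: bool            # science counts in AYP or a sanction-bearing dual system
    required: bool = False        # False when schools may pick science from a menu
    grades: Tuple[int, ...] = ()  # grades whose science results feed accountability


def assign_group(policy: SciencePolicy, naep_grade: int) -> int:
    """Return the study group (1, 2, or 3) for one NAEP-tested grade."""
    if naep_grade not in NAEP_SCIENCE_GRADES:
        raise ValueError("NAEP science is assessed only in grades 4 and 8")
    if not policy.uses_science:
        return 1  # science is not part of accountability calculations
    if policy.required and naep_grade in policy.grades:
        return 3  # required, and from a grade that NAEP also tests
    return 2      # optional, or required only from a grade NAEP does not test


# North Carolina requires fifth- and eighth-grade science results (from the text).
north_carolina = SciencePolicy(uses_science=True, required=True, grades=(5, 8))
# Georgia lets elementary schools choose science as the additional indicator.
georgia = SciencePolicy(uses_science=True, required=False)

print(assign_group(north_carolina, 4), assign_group(north_carolina, 8))  # 2 3
print(assign_group(georgia, 4))  # 2
```

Because the rule is applied separately for each NAEP grade, a state such as North Carolina can fall into different groups for the fourth- and eighth-grade analyses.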

Table 1

Categories of States Based on Use of Science Achievement in Accountability Calculations

Group                                   4th Grade                 8th Grade
Group 1: Do not include science in      All other states          All other states
  accountability
Group 2: Include science in             CA, GA, MI, NC, OH, VA    GA, KY
  accountability, but either as a
  choice or do not use results from
  the NAEP tested grade
Group 3: Require science in             KY, NY, SC, TN, UT        CA, MI, NY, NC, OH, SC,
  accountability from the NAEP                                    TN, UT, VA
  tested grade


Use of NAEP Data 

As mentioned, NAEP data were used to address the research questions. The first
research question asks whether including science in accountability
calculations affects student achievement. NAEP science achievement results from 2000 and
2009 were available and considered well suited to address the first research question. NCLB
legislation, which reauthorized the Elementary and Secondary Education Act (ESEA), was
passed in 2001 and began to take effect in 2002 (No Child Left Behind Act, 2002). The NAEP
data from 2000 therefore provided a glimpse into pre-accountability austerity. Because the
framework for the NAEP science achievement assessment was different in 2009 than it was in
2000 (National Assessment Governing Board, 2008), scale scores from the two years could not
be strictly compared. Although the science framework had changed, both the 2000 and
2009 assessments were built on the construct of science, and the 2000 achievement results could
be used as covariates when comparing 2009 fourth- and eighth-grade results across
Groups 1, 2, and 3.

A point of clarification must be made regarding the use of NAEP data for this study that 
investigated how differing AYP practices among the states might yield dissimilar results on 
NAEP assessments. It is important to note that NAEP assessment results are not integrated 
into any AYP calculations and are not used as proxies for states’ high stakes accountability tests. 
Studies have demonstrated that the percentage of students scoring proficient on state
examinations can differ greatly from the percentage of students scoring at the Proficient level
on NAEP (Stoneberg, 2007), and therefore using NAEP data when examining AYP practices
may warrant caution. For example, although overall NAEP results have remained relatively
stable over time, student proficiency on state assessments has increased in some states (Jacob,
2007). One likely cause of this discrepancy is suggested by a state-level
examination of proficiency standards conducted by the National Center for Education Statistics
(NCES), which indicated that in approximately half of the cases examined in grades 4 and 8, the rigor
of states’ standards had decreased between 2005 and 2009 (Bandeira de Mello, 2011). However,
this lack of corroboration between NAEP results and the proportion of students meeting states’ 
proficiency standards was not viewed as a deterrent to this study. Within this study NAEP was 
not used as confirmatory evidence of states’ assessment results; rather, NAEP was utilized only 
as a basis of comparison of achievement in the content areas of science, mathematics, and 
reading, as NAEP “offers the most reliable and equitable measures of student achievement 
across states” (Nicholas, 2005, p. 2). That is, it was not an objective to determine if states’ 
methods of determining proportions of proficient students on their state assessments correlated 
with NAEP. Instead, the intent was to determine if the variable of including science results from 
state assessments in state-level accountability practices influenced achievement, as detected by 
NAEP. 

To be consistent with the selection of the science NAEP analysis, mathematics and 
reading NAEP data were selected from 2009 and a pre-NCLB year. Similar to science, 
mathematics had been assessed in fourth- and eighth-grade in both 2000 and 2009. Reading had 
been assessed in both fourth- and eighth-grade in 1998 and again in 2009. The use of these data 
allowed the second research question to be addressed. If the inclusion of science in
accountability programs reduced attention to the core subjects of mathematics and reading, then
a negative impact on mathematics and reading achievement would be expected in the
Group 3 states.

Data Analysis 

A series of analyses of covariance (ANCOVAs) was conducted for the three subject areas
of science, mathematics, and reading, using the pre-NCLB NAEP mean scale scores as
covariates and the 2009 NAEP mean scale scores as the dependent variables. The ANCOVAs
were conducted for each of the three subjects and
for both fourth- and eighth-grade data. The use of ANCOVA allowed the 2009 data to
be adjusted for the pre-NCLB covariate data; this procedure yields adjusted means, also referred
to as estimated marginal means, on the 2009 scale scores “as if” all groups had equivalent scale
scores in the pre-NCLB year. Significant omnibus F-test results were followed up with Fisher
least significant difference (LSD) post-hoc comparisons.

Education Policy Analysis Archives Vol. 20 No. 26

NAEP achievement data from 1998 were used as the covariate when analyzing 
differences among the groups in reading and NAEP achievement data from 2000 were used as 
the covariates when analyzing differences among the groups in mathematics and science. To be 
included in the analyzed dataset, states needed to have reported NAEP data from the pre-NCLB 
year and from 2009. This requirement eliminated some states from the study. Fourth-grade data 
were available from 39 states for reading, 40 states for mathematics, and 37 states for science. 
Eighth-grade data were available from 36 states for reading, 39 states for mathematics, and 36 
states for science. 
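
The adjustment described in this section can be illustrated with the textbook formulas for a one-way ANCOVA with a single covariate. The sketch below is not the analysis code used in the study, which the text does not describe; the function name, the data layout, and the two-group `demo` values are invented for illustration and are not NAEP results. It assumes a common within-group slope and uses N - k - 1 error degrees of freedom, the usual convention when one covariate is fitted.

```python
from statistics import fmean


def ancova_one_covariate(groups):
    """One-way ANCOVA with a single covariate.

    `groups` maps a label to a list of (covariate, outcome) pairs, for example
    (pre-NCLB mean scale score, 2009 mean scale score) for each state.
    Returns (adjusted_means, F, df_between, df_error).
    """
    all_pairs = [pair for pairs in groups.values() for pair in pairs]
    n_total, k = len(all_pairs), len(groups)
    grand_x = fmean(x for x, _ in all_pairs)
    grand_y = fmean(y for _, y in all_pairs)

    # Total sums of squares and cross-products (ignoring group membership).
    sxx_t = sum((x - grand_x) ** 2 for x, _ in all_pairs)
    syy_t = sum((y - grand_y) ** 2 for _, y in all_pairs)
    sxy_t = sum((x - grand_x) * (y - grand_y) for x, y in all_pairs)

    # Pooled within-group sums of squares and cross-products.
    sxx_w = syy_w = sxy_w = 0.0
    means = {}
    for label, pairs in groups.items():
        mx = fmean(x for x, _ in pairs)
        my = fmean(y for _, y in pairs)
        means[label] = (mx, my)
        sxx_w += sum((x - mx) ** 2 for x, _ in pairs)
        syy_w += sum((y - my) ** 2 for _, y in pairs)
        sxy_w += sum((x - mx) * (y - my) for x, y in pairs)

    slope = sxy_w / sxx_w  # common within-group regression slope

    # Adjusted (estimated marginal) means: each group's outcome mean moved to
    # where it would sit if the group had the grand-mean covariate value.
    adjusted = {g: my - slope * (mx - grand_x) for g, (mx, my) in means.items()}

    # Residual sums of squares with and without the group factor, then the
    # omnibus F statistic for the group effect.
    ss_error = syy_w - sxy_w ** 2 / sxx_w
    ss_reduced = syy_t - sxy_t ** 2 / sxx_t
    df_between, df_error = k - 1, n_total - k - 1
    f_stat = ((ss_reduced - ss_error) / df_between) / (ss_error / df_error)
    return adjusted, f_stat, df_between, df_error


# Made-up (covariate, outcome) pairs for two hypothetical groups.
demo = {"A": [(1, 2), (2, 3), (3, 5)], "B": [(1, 4), (2, 6), (3, 6)]}
adjusted_means, f_stat, df_between, df_error = ancova_one_covariate(demo)
print(adjusted_means, f_stat, df_between, df_error)
```

Each adjusted mean shifts a group's 2009 mean by the pooled slope times the gap between that group's pre-NCLB mean and the grand pre-NCLB mean, which is the sense in which groups are compared “as if” they had started from equivalent scores.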

Data were analyzed for all students and were also disaggregated and analyzed based on
socioeconomic status (SES). To varying degrees, SES has been shown to be related to
academic achievement in multiple subject areas (e.g., McGraw, Lubienski, & Strutchens, 2006;
Perry & McConney, 2010; Stipek & Ryan, 1997; Willms, 2003). Also of interest was determining
whether effects attributable to the use of science in state accountability programs could be detected
among student groups based on ethnicity. Researchers have examined disparities in achievement
among ethnic groups on standardized tests, disparities that are often related to SES differences among the
groups (Flores, 2007; Magnuson, Rosenbaum, & Waldfogel, 2008). However, the NAEP data
were inadequate to examine groups based on ethnicity. Disaggregation of NAEP science data
by ethnicity yielded too few states meeting the criteria of the National Assessment
Governing Board (NAGB), which oversees NAEP administration and reporting, to be usable
across all ethnic groups. For example, only four states among the eighth-grade data
met the NAGB criteria for both the pre-NCLB and post-NCLB data years for the groups of
Asian and American Indian students. A reduced number of states reported data for other ethnic groups
(e.g., Hispanic), though not to the extreme seen for Asian and American Indian
students. Because the requirement that data be available from both the pre-NCLB
year and 2009 had already reduced the number of states analyzed by approximately 30 percent,
analysis remained focused on the broad category of all students and the SES-based categories
that were generally well reported by the states.

Results 

Results of ANCOVA analysis from the aggregate category of all students and SES-based 
categories (i.e., eligible and not eligible for free or reduced lunch) are provided for fourth-grade 
in Table 2. 

Separate ANCOVA analyses of all students' NAEP scale scores for the three fourth-grade 
subject areas revealed that, after controlling for the pre-NCLB scale scores (covariates), there were 
no significant differences among the groups in reading or in mathematics. There were, though, 
significant differences in the category of all students among the groups of states in their fourth-grade 
science results, F(2, 34) = 4.831, p < .05. A significant difference among the groups of states 
persisted when evaluating the fourth-grade data in the categories of students eligible for free or 
reduced lunch, F(2, 34) = 3.639, p < .05, and students not eligible for free or reduced lunch, 
F(2, 34) = 4.286, p < .05. 
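
The covariate adjustment described above can be sketched as a one-way ANCOVA with a single covariate. The code below is a minimal illustration on made-up numbers, not the analysis run for this study, whose state-level scores are not reproduced here. The function name `one_way_ancova` and the toy data are hypothetical. The omnibus F compares covariate-adjusted between-group and within-group sums of squares, and each estimated marginal mean shifts a group's post mean along the pooled within-group slope to the grand covariate mean.

```python
from statistics import fmean


def one_way_ancova(groups):
    """One-way ANCOVA with one covariate.

    groups maps a label to a list of (covariate, outcome) pairs.
    """
    pairs_all = [pair for pairs in groups.values() for pair in pairs]
    n_total, k = len(pairs_all), len(groups)
    grand_x = fmean([x for x, _ in pairs_all])
    grand_y = fmean([y for _, y in pairs_all])

    # Total sums of squares and cross-products around the grand means.
    sxx_t = sum((x - grand_x) ** 2 for x, _ in pairs_all)
    sxy_t = sum((x - grand_x) * (y - grand_y) for x, y in pairs_all)
    syy_t = sum((y - grand_y) ** 2 for _, y in pairs_all)

    # Pooled within-group sums of squares and cross-products.
    sxx_w = sxy_w = syy_w = 0.0
    group_means = {}
    for label, pairs in groups.items():
        mean_x = fmean([x for x, _ in pairs])
        mean_y = fmean([y for _, y in pairs])
        group_means[label] = (mean_x, mean_y)
        sxx_w += sum((x - mean_x) ** 2 for x, _ in pairs)
        sxy_w += sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        syy_w += sum((y - mean_y) ** 2 for _, y in pairs)

    slope = sxy_w / sxx_w                        # common within-group slope
    ss_error = syy_w - sxy_w ** 2 / sxx_w        # adjusted within-group SS
    ss_total_adj = syy_t - sxy_t ** 2 / sxx_t    # adjusted total SS
    ss_groups = ss_total_adj - ss_error          # adjusted between-group SS
    df_groups, df_error = k - 1, n_total - k - 1
    f_stat = (ss_groups / df_groups) / (ss_error / df_error)

    # Estimated marginal means: group means evaluated at the grand covariate mean.
    emm = {
        label: mean_y - slope * (mean_x - grand_x)
        for label, (mean_x, mean_y) in group_means.items()
    }
    return {"F": f_stat, "df": (df_groups, df_error), "slope": slope, "emm": emm}


# Toy data only: (pre score, post score) pairs for three hypothetical groups.
toy = {
    "group_1": [(1, 1), (2, 2), (3, 4)],
    "group_2": [(1, 2), (2, 3), (3, 5)],
    "group_3": [(1, 3), (2, 4), (3, 6)],
}
result = one_way_ancova(toy)
print(result["df"], round(result["F"], 3))
```

In this formulation the error degrees of freedom are the number of states minus the number of groups minus one for the covariate. Because the toy groups share the same covariate mean, their estimated marginal means equal their raw post means; with real data the two generally differ, as the EMM column of Table 2 shows.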
Table 2 

Fourth-grade NAEP Pre- and Post-NCLB, All Students and SES-based Groups 

                                                      Pre-NCLB       Post-NCLB
Subject  Students                   Group    n        M     SD        M     SD      EMM       F       p
-------------------------------------------------------------------------------------------------------
Reading  All                        1       29    214.7    8.2    220.0    6.9    219.7
                                    2        5    211.4    6.0    218.3    5.9    220.1    .090    .914
                                    3        5    214.0    3.6    220.4    4.4    220.5
         Eligible free/reduced      1       29    199.0    8.0    207.1    5.4    206.9
                                    2        5    194.3    7.3    204.3    5.3    205.8    .347    .709
                                    3        5    199.8    5.4    208.8    5.3    208.3
         Not eligible free/reduced  1       29    225.3    5.7    231.0    5.3    230.8
                                    2        5    223.2    3.1    230.8    3.2    231.9    .152    .859
                                    3        5    225.5    3.9    231.2    4.3    230.9
Math     All                        1       29    224.9    6.6    239.4    6.5    239.2
                                    2        6    225.1    7.4    239.1    5.1    238.7    .066    .936
                                    3        5    222.2    3.5    237.5    3.7    239.4
         Eligible free/reduced      1       29    212.5    6.0    228.9    5.6    228.5
                                    2        6    210.7    6.8    226.5    4.9    227.2    .403    .671
                                    3        5    209.1    4.1    227.2    3.9    229.0
         Not eligible free/reduced  1       29    234.1    4.5    248.6    4.8    248.7
                                    2        6    235.2    4.4    250.0    3.4    249.2    .299    .743
                                    3        5    233.3    3.0    247.2    3.0    247.8
Science  All                        1       26    149.4    8.3    149.9    7.8    149.3
                                    2        6    146.6   10.1    149.6    9.1    151.5   4.831   .014*
                                    3        5    147.8    5.5    152.0    5.5    152.9
         Eligible free/reduced      1       26    135.9    9.1    136.7    7.6    135.7
                                    2        6    128.9   10.0    134.0    7.7    138.2   3.639   .037*
                                    3        5    134.4    5.9    139.2    6.3    139.4
         Not eligible free/reduced  1       26    160.0    5.2    161.7    5.6    161.5
                                    2        6    159.3    5.1    163.0    6.2    163.6   4.286   .022*
                                    3        5    159.2    2.8    164.6    4.5    165.2

EMM: Estimated Marginal Mean 
*p < .05 


Fisher LSD comparisons were conducted to determine differences between the groups in 
fourth-grade science achievement. The post hoc tests revealed significant differences in fourth-grade 
science between Group 1 (i.e., states not including science in accountability) and Group 3 (i.e., 
states that require science in accountability), p < .05, for the categories of all students, students 
eligible for free or reduced lunch, and students not eligible for free or reduced lunch. In all three 
cases, Group 3 had significantly higher mean scale science scores. Fourth-grade science achievement 
was significantly higher in states where science was required to be part of an accountability program. 
There were no significant differences in the fourth-grade data between Group 2 and either Group 1 
or Group 3 for the all students and SES-based categories. 
Partial eta-squared ($\eta_p^2$) measurements were used to determine effect sizes, with small, 
medium, and large effects operationalized as .01, .06, and .14, respectively (Stevens, 1992). The 
effect of using science in accountability programs on the 2009 mean NAEP science scale scores was 
considered large for the fourth-grade categories of all students ($\eta_p^2$ = .226), students eligible 
for free or reduced lunch ($\eta_p^2$ = .181), and students not eligible for free or reduced lunch 
($\eta_p^2$ = .206). 
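
For readers who wish to check the effect sizes, partial eta-squared can be recovered from an F ratio and its degrees of freedom as (F x df_effect) / (F x df_effect + df_error). The sketch below applies that identity to the fourth-grade science F ratios. Two points are assumptions rather than statements from the study: the error degrees of freedom are taken as 33 (37 states, minus 3 groups, minus 1 covariate), the value at which the arithmetic reproduces the reported effect sizes, and the "negligible" label for values below .01 is added here for completeness. The function names are hypothetical.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta-squared recovered from an F ratio and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


def effect_label(eta_sq):
    """Label an effect using the .01 / .06 / .14 benchmarks cited in the text."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # label assumed here; the text defines no such band


# Fourth-grade science F ratios from Table 2; df_error = 33 is an assumption.
fourth_grade_science = {
    "all students": 4.831,
    "eligible for free/reduced lunch": 3.639,
    "not eligible for free/reduced lunch": 4.286,
}
effects = {
    category: round(partial_eta_squared(f_stat, 2, 33), 3)
    for category, f_stat in fourth_grade_science.items()
}
for category, eta_sq in effects.items():
    print(f"{category}: {eta_sq:.3f} ({effect_label(eta_sq)})")
```

Under these assumptions the three values come out to .226, .181, and .206, all above the .14 benchmark for a large effect.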

The eighth-grade data were similarly analyzed. Table 3 provides results of the ANCOVA 
analyses of the three groups of states across the subjects of reading, mathematics, and science. 

Table 3 

Eighth-grade NAEP Pre- and Post-NCLB, All Students and SES-based Groups 

                                                      Pre-NCLB       Post-NCLB
Subject  Students                   Group    n        M     SD        M     SD      EMM       F       p
-------------------------------------------------------------------------------------------------------
Reading  All                        1       27    261.4    6.2    262.5    6.5    262.3
                                    2        2    259.8    3.6    263.5    4.7    264.6    .468    .625
                                    3        7    260.3    5.3    260.8    4.8    261.5
         Eligible free/reduced      1       27    247.4    6.0    250.2    4.9    249.8
                                    2        2    245.4    7.5    253.0    5.4    253.5    .971    .389
                                    3        7    244.0    5.5    247.9    3.9    249.1
         Not eligible free/reduced  1       27    268.4    5.1    271.2    5.1    271.4
                                    2        2    269.1    1.7    273.8    3.1    273.5   1.094    .347
                                    3        7    269.5    3.5    270.5    2.8    270.0
Math     All                        1       28    272.8    9.2    282.4    8.9    281.9
                                    2        2    267.6    3.2    278.4    1.2    282.1    .029    .972
                                    3        9    271.1    7.4    280.7    5.3    281.6
         Eligible free/reduced      1       28    256.4    9.5    268.2    7.0    267.4
                                    2        2    250.4    6.3    266.2    2.3    268.9    .146    .865
                                    3        9    251.8    6.5    265.5    4.4    267.4
         Not eligible free/reduced  1       28    281.0    7.3    292.3    7.2    292.1
                                    2        2    279.1    1.9    290.0    0.5    291.3    .054    .948
                                    3        9    280.2    5.9    291.4    4.4    291.8
Science  All                        1       25    148.6    8.9    149.5    8.5    149.0
                                    2        2    145.8    5.4    151.6    6.7    153.6   2.711    .082
                                    3        9    147.0    9.3    149.4    7.4    150.3
         Eligible free/reduced      1       25    132.3   10.3    135.6    8.5    134.8
                                    2        2    129.1     --    140.1   10.3    141.7   4.269   .023*
                                    3        9    128.6    9.2    134.2    7.1    136.2
         Not eligible free/reduced  1       25    157.2    6.5    159.5    6.1    159.3
                                    2        2    156.5    3.5    162.9    2.9    163.1   3.556   .040*
                                    3        9    155.8    7.5    160.6    4.2    161.3

EMM: Estimated Marginal Mean 
*p < .05 
--: value not legible in the source 


Similar to the fourth-grade analysis, the only significant differences among the three 
groups were found in the subject of science. There were no significant differences among 
groups in either eighth-grade reading or eighth-grade mathematics data when comparing 2009 
mean scale scores, with the pre-NCLB scores used as the covariates. Unlike the fourth-grade 
results, the eighth-grade results in science did not align neatly with the hypothesis that inclusion 
of science in accountability programs will lead to greater achievement. The omnibus F tests 
did not reveal a significant effect in the category of all students. However, there was a significant 
difference among the groups in the categories of students eligible for free or reduced lunch, 
F(2, 33) = 4.269, p < .05, and students not eligible for free or reduced lunch, F(2, 33) = 3.556, 
p < .05. Within the category of students eligible for free or reduced lunch, the significant 
differences were between Group 1 and Group 2 (p = .008) and between Group 2 and Group 3 
(p = .043). Group 2 comprises those states that include science in some manner in their 
accountability programs but either offer this as an option for their schools or do not use results 
from the NAEP grade being assessed, in this case eighth grade. Here, adjusting for the covariate 
(2000 NAEP data), the two states comprising Group 2, Georgia and Kentucky, had significantly 
higher mean scale scores on the 2009 NAEP science assessment than the two other groups of states. 
Regarding the category of students not eligible for free or reduced lunch, although the overall 
model demonstrated significance (p = .040), the post-hoc tests did not reveal significant 
differences between groups of states at the criterion level, p < .05. 

Discussion 

In the presentation of the imagined accountability filtration system (Figure 1), the 
supposition was posed that allowing the third subject of science to pass through the membrane and 
be included in high-stakes accountability would diffuse attention across mathematics and 
reading, and consequently lead to relatively lower achievement in those subjects. The data presented 
here do not support this supposition. In both fourth and eighth grade, 2009 NAEP mathematics 
and reading achievement scores did not differ significantly among the groups of states. What did 
differ across the three groups of states was their 2009 NAEP science achievement. 

The fourth-grade data support the hypothesis that including students' science achievement 
results in accountability calculations will promote higher achievement. Of course, these state-level 
data provide only a mile-high view of learning and do not indicate what fourth-grade practices and 
informal policies may differ among the groups of states. However, the results from the fourth-grade 
data are consistent with the previous study of 2005 NAEP data (Judson, 2010). That earlier study 
can be viewed as an intermediate data inspection wherein 2000 and 2005 NAEP science data were 
compared and, as is the case here, the hypothesis that inclusion of science results in accountability 
formulas will promote science achievement was supported only at fourth grade. Yet this study 
additionally demonstrates that states venturing to include science are not losing a step in reading or 
mathematics. This latter finding may at first seem to run counter to the commonsense logic that, in 
order to attend well to an additional subject, some resources must be drawn from mathematics 
and/or reading. This line of thought might be too limited: schools in the Group 3 states may not 
simply be robbing Peter to pay Paul, that is, taking resources from mathematics and reading in 
order to attend to science. Instead, further investigation must consider whether fourth-grade 
classrooms in the Group 3 states are incorporating science through an overall enriched curriculum. 
Integration of science with mathematics and with literacy has been shown to have benefits across all 
of these subjects, so the question is now raised whether any form of interdisciplinary curriculum is 
more prevalent among the Group 3 states in their fourth-grade classrooms. Of course, it must also 
be considered that a shifting of resources may be occurring in the Group 3 states' fourth-grade 
classrooms, not at the cost of reading or mathematics but at the cost of the subjects still left on 
the left side of the accountability membrane, such as physical education and art. Prior research 
has indicated that non-high-stakes subjects may be receiving less attention, so while these 
fourth-grade data can be hailed as support for including science in accountability calculations, 
further analysis is needed to determine whether other subjects have been even more disregarded. 

Less straightforward to interpret are the eighth-grade results. The eighth-grade data were 
similar to the fourth-grade data in that 2009 NAEP mathematics and reading achievement did not 
differ significantly among the groups of states. However, unlike the fourth-grade data, the 
eighth-grade data did not reveal Group 3 states to have greater relative achievement on the 2009 
NAEP science assessment. It might be thinly argued that some positive effect was found in the 
Group 2 states, namely Kentucky and Georgia, because plausibly there is some spillover into 
eighth-grade science results when these states allow schools to choose to include science in their 
accountability formulas. It is doubtful, however, that such an effect would be detected only in the 
Group 2 states. More likely, a science accountability effect is simply not registering in eighth 
grade. The effect found in the fourth-grade data, but not the eighth-grade data, is likely due to 
how science is characteristically taught in these grades. Across the United States, far more often 
than not, eighth-grade students have multiple teachers and their schedules provide allotted 
instructional time for their courses. Although anecdotes exist of eighth-grade science teachers 
being instructed by school administrators to stop teaching science days before a state's 
high-stakes test so as to help drill students on mathematics skills, in eighth grade there is 
generally blocked and defined time for science. This is not necessarily the case in fourth grade. 
In a typical elementary school, where fourth-grade teachers must teach multiple subjects, the 
terrain of the curriculum can be more flexible. This may mean that more resources are devoted to 
providing professional development for teachers to improve instructional practices in the 
high-stakes subjects. It may also simply mean that more time is devoted to those high-stakes 
subjects. Further investigation of the amount of time spent on science in the three groups of 
states would advance this line of reasoning. 

Regarding implications for policymakers, this study may have weight when members of 
Congress consider reauthorization of the Elementary and Secondary Education Act (ESEA) or 
when state decision-makers determine revisions of their state accountability policies. There have 
been past attempts to require that science results be used in AYP calculations. The Science 
Accountability Act (2006, 2007, 2009) has been introduced in Congress three times in attempts to 
include science in AYP calculations, but each time the bill has failed to make it out of committee. 
The National Science Teachers Association (NSTA) brought together 61 organizations to back the 
Make Science Count petition (2007) to Congress, and in 2011 the K-12 STEM Education Policy 
Conference included among its key talking points with Congressional members the imperative to 
include science on par with mathematics and reading when ESEA is reauthorized. There have been 
other recent recommendations to include science in accountability, such as that from the National 
Research Council (NRC) (2011), which recommended that “policy makers at the national, 
state, and local levels should elevate science to the same level of importance as reading and 
mathematics.” 

Science education is in the midst of its next wave of change. The recently released 
Framework for K-12 Science Education (National Research Council, 2012) provides the guidelines 
from which the Next Generation Science Standards will emerge; the standards are expected to be 
released in 2012 (Robelen, 2011). This new conceptual framework lays down a valuable foundation 
for teaching and learning science that will steer science standards and consequently classroom 
curriculum. In all likelihood, a majority of the states will adopt the new science standards and 
revisit their science assessments. Yet the impact on accountability policy is unknown. Sensibly, 
this would be an occasion for states to simultaneously re-examine their accountability policies; 
the time is ripe for policymakers to deliberate on research findings as they make their decisions. 
It is hoped that this study will be among the informative studies that policymakers consult. 